ABSTRACT

In this pilot study looking at interlanguage prosody,
normal and contrastively focused constructions in English were
collected from four English-as-a-first-language speakers and four
Japanese-as-a-first-language speakers. These productions were then
played to six native English speakers to see how well they could
identify the stress placement of the utterances. The judgements were
used as a diagnostic tool to study the salient characteristics of 
problems in non-native stress productions. It was found that stress 
placement was easier to recognize in native speaker tokens, although 
it was not clear what features of stress were most important to the 
judges. Possible explanations and the directions they suggest for 
further study in second language prosody are given. (Author) 
Stress in Japanese English:
Evidence from native perceptual judgements 



Brian D. Teaman 

University of Pennsylvania 
Department of Linguistics



In this pilot study looking at interlanguage prosody, normal and contrastively focused 
constructions in English were collected from four L1 English speakers and four L1
Japanese speakers. These productions were then played to six native English speakers 
to see how well they could identify the stress placement of the utterances. The 
judgements were used as a diagnostic tool to study the salient characteristics of problems 
in non-native stress productions. It was found that stress placement was easier to 
recognize in native speaker tokens, although it was not clear what features of stress were
most important to the judges. Possible explanations and the directions they suggest for 
further study in second language prosody are given. 



Introduction

The study of prosody is a relatively untouched aspect of second language
acquisition (SLA) research. There have been few studies of prosodic development or
descriptions of prosody in second language learning.¹ This lack of analysis continues
in spite of the suggestion that prosody might be more important than segmental effects
in determining L2 comprehensibility (Gilbert, 1990). This is a very strong claim, and it
will not be possible to test it without first understanding interlanguage prosody in a clear
way. One important aspect of prosody in English is stress. Not only does stress help
identify and locate words, but the main stress of a phrase acts as a locus for
intonational contours. This study will look at stress of two different types, normal and
contrastive, in order to begin to understand how well stress variability is controlled by
Japanese speakers of English.

This study was motivated by casual observations in the classroom that
Japanese English speech often seems to be characterized by a relatively level pitch
with high rise-falls on the nucleus of the intonational phrase. The intonational
contours of these speakers often seem inappropriately insistent. Although there seem
to be certain characteristics of the pitch contour that determine the oddness of these
types of sentences, sound properties of the intonational phrase play only a part;
pragmatic differences also have an effect. An example of this is shown in (1). These
intonational contours² seem perfectly normal on their own. The problem is their
juxtaposition. To put two declarative intonations of roughly the same contour on this
phrase seems inappropriate. The focusing of both "awareness" and "issues" suggests
different pragmatic intentions than if only one intonation peak were found on either element.
Explaining characteristics of interlanguage intonation involves not only an 
understanding of its characteristic intonation and stress, but also a consideration of 
pragmatic concerns. 

(1) [stylized contour: two declarative peaks, one on "awareness" and one on "issues"]

They have a certain awareness of environmental issues



I will first look at the realization of sentence stress in different pragmatic 
contexts. The sentence "I have a red dog" has a standard declarative pitch contour as 
shown in (2a). In (2b) a contrastively focused counterpart is shown. 

(2) [stylized contours: the intonational nucleus falls on the capitalized word]

a. I have a red DOG        b. I have a RED dog

The way that contrasting elements are highlighted in standard English is to simply 
move the nucleus of the intonation contour from the normal position ("dog" in this case) 
to the contrasted element ("red"). Some background in stress and intonation in 
Japanese and English is necessary before discussing the current study and results. 

Accent and intonation in English and Japanese 

While contrastive analysis and error analysis have fallen into disfavor as explanatory
theories in SLA, the influence of transfer cannot be disregarded (Ioup, 1984). I argue
that due to the complexity of accent and intonation, coupled with the inherent variability
of interlanguage systems, a thorough understanding of the relevant L1 and L2 systems
will be not only helpful but necessary. The most logical beginning point for
understanding characteristics of the interlanguage data is to first look at stress and
intonation in the L1 and the target language.

Japanese and English prosody have been the subject of many recent studies
which provide a thorough descriptive and theoretical basis with which to approach
second language data.³ Beckman (1986:5) distinguishes English as a prototypical
stress-accent language and Japanese as a prototypical non-stress-accent language.
Van der Hulst and Smith (1988) use the term "pitch-accent" to describe Japanese.
Though the form differs between the two languages, accent functions to mark certain syllables
distinctively, distinguishing words like cóntent and contént in English, and
distinguishing the Japanese words háshi 'chopsticks', hashí 'bridge,' and hashi 'edge.'
Stress also functions to delimit words in a string, just as the verb and noun are
stressed in the English sentence 'he dránk a cóffee.' In Japanese, accent is not such a
reliable indicator in marking words in a string, since it is possible to have extended
stretches containing no pitch accent (compare the pitch-accented 8a, b and c with the
unaccented counterpart in 13, noting the extended high-pitched, unaccented portion of
13). Japanese and English are similar in that they have lexical accent, which means
that accent cannot be predicted; it must be specified for each word.⁴

Perceptually, there are three aspects to stress-accent in English: pitch, length 
and loudness. Fry (1958) tested these three parameters by eliciting judgements of 
synthesized tokens while carefully varying pitch, length and loudness. Pitch was 
found to be the most perceptually significant variable, followed by duration and then 
loudness. In spite of the significance of pitch, Lea (1977) found that an algorithm for
recognizing stress worked best if length and volume were also taken into account. 
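
The idea of combining the three cues can be illustrated with a minimal sketch. It is not Lea's (1977) algorithm and does not reproduce Fry's (1958) measurements; the `Syllable` record, the `WEIGHTS` values and the normalization step are all hypothetical. The weights are chosen only so that their order (pitch, then duration, then loudness) follows the ranking reported above.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    text: str
    f0_hz: float         # peak fundamental frequency of the syllable
    duration_ms: float   # length of the syllable nucleus
    intensity_db: float  # loudness


# Hypothetical weights: only their ORDER (pitch > duration > loudness)
# reflects the ranking described in the text; the values are invented.
WEIGHTS = {"f0_hz": 0.5, "duration_ms": 0.3, "intensity_db": 0.2}


def normalize(values):
    """Rescale values to the range 0..1 within one phrase."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def stress_scores(syllables):
    """Weighted sum of the three normalized cues for each syllable."""
    scores = [0.0] * len(syllables)
    for cue, weight in WEIGHTS.items():
        normed = normalize([getattr(s, cue) for s in syllables])
        for i, value in enumerate(normed):
            scores[i] += weight * value
    return scores


def most_prominent(syllables):
    """Index of the syllable with the highest combined score."""
    scores = stress_scores(syllables)
    return max(range(len(syllables)), key=scores.__getitem__)
```

With invented measurements such as `Syllable("red", 180.0, 200.0, 65.0)` and `Syllable("dog", 220.0, 300.0, 70.0)`, `most_prominent` returns the index of "dog", since all three cues point the same way. The more interesting cases are those where the cues disagree, which is where the choice of weights matters.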

Current analyses of English intonation posit a connection between stress as
discussed above and the intonational phrase. The most accentually prominent point, a
stressed position, can then be used to "hang" one of a selected number of intonation
contours. The metrical structure of words and/or phrases can be combined with intonational
contours by matching the * in the metrical string of (3) to the * of the intonational
contour shown in (4). The "(L)H*+M" contour, with an optional Low tone associated
with the onset and a High fall to Mid-level, could be added to these simple structures to
yield the same basic elongated call.⁵
(3)
    *                 *
    A ber na thy    A li cia

(4) (L)H*+M (Calling contour)

In this case, the H* would be linked to the stressed syllable and the M tone would link
with the coda syllables. In "Abernathy," "A" would be linked with the high tone, while the
other syllables would be associated with an M tone. Note that this word has no onset,
so there is no material to align with the optional low. "Alicia," on the other hand, would
get the optional L tone placed on the onset "A," while "li" would get an H tone and "cia,"
as the coda, would be associated with the M. Example (5) shows the
intonation contours that would result from the combination of the metrical structures
represented in (3) with the contour of (4). Note that the difference between 5a and 5b
is simply the existence of an onset to the nucleus in "Alicia," while "Abernathy" begins
with an accented syllable.




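(5) [stylized contours: (a) high on "A," mid on the remaining syllables; (b) low on "A," high on "li," mid on "cia"]

a. A ber na thy        b. A li cia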
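
The alignment just described can be sketched as a small function. This is an illustrative simplification rather than Liberman's or Pierrehumbert's formalism: the function name and the list-of-syllables representation are assumptions, and the case where the stressed syllable is final (so that no coda is left for the M tone) is not addressed in the text and is left untreated here.

```python
def align_calling_contour(syllables, stressed):
    """Assign the tones of the (L)H*+M contour to a list of syllables.

    syllables: list of syllable strings
    stressed:  index of the metrically strongest syllable (the * position)
    """
    if not 0 <= stressed < len(syllables):
        raise ValueError("stressed index out of range")
    tones = []
    for i, _ in enumerate(syllables):
        if i < stressed:
            tones.append("L")   # optional low tone on any onset material
        elif i == stressed:
            tones.append("H*")  # high tone linked to the stressed syllable
        else:
            tones.append("M")   # mid tone on the coda syllables
    return list(zip(syllables, tones))


abernathy = align_calling_contour(["A", "ber", "na", "thy"], stressed=0)
alicia = align_calling_contour(["A", "li", "cia"], stressed=1)
```

Here `abernathy` has no L tone because the word begins with the accented syllable, while `alicia` receives L on its onset. That is the only difference between 5a and 5b.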



To understand English pitch as it is realized in phrases, it is important to keep 
separate the two different levels: stress assignment and the intonational contour. This 
understanding of the independence of stress and intonation as well as their interaction 
is crucial to understanding English prosody. 

Japanese works differently. The shape of a pitch contour is determined by the
pitch accent of individual words, with phonological rules and focus constraints creating a
contour from the lexical pitch accents. While this is a simplification of the processes, it is
easy to substantiate the claim that the pitch properties of individual words contribute to
the overall intonation of phrases in ways that are not possible in English.

To demonstrate how the pitch properties of words determine different possible
intonational phrases, I will refer to an example given by Kurasawa-Williams (1992). 
The basic phrase is shown in sentence (6). 

(6) Yonémoto-san no ozyóo-tyan
    Yonemoto-HON GEN woman-DIM
    [Mr. Yonemoto's daughter]
The Japanese phrase has pitch accents on two words (indicated by the
accent marks). Different kinds of focus allow for three different pitch patterns
associated with (6). The questions in 7a, 7b and 7c force focusings that yield the three
different intonational phrases shown in 8a, 8b and 8c.



(7) a. Was it Mr. Yonemoto's son or daughter who called?
    b. Whose daughter is coming?
    c. Who is coming?



(8) [stylized pitch contours not reproduced]

a. Yonémoto-san no ozyóotyan

b. Yonémoto-san no ozyóotyan

c. Yonémoto-san no ozyóotyan



English allows only two patterns for one intonational phrase. So, in English, 9a would
be an answer to both 7a and 7b, while 9b would be an answer to 7c.

(9) [stylized pitch contours not reproduced]

a. Mr. Yonemoto's daughter

b. Mr. Yonemoto's daughter



Therefore, Japanese allows up to three different possible intonational phrases for two
pitch-accented words,⁶ while English allows only two possibilities for two
stress-accented words in the same pragmatic context. This difference is due to the fact that
pitch is an inherent property of words in Japanese and rules of phonological focus as
shown here depend crucially on the pitch-accent properties of the words in the
sentence. In English, there is only the option of moving the nucleus of the pitch
contour from "daughter" to "Yonemoto."

Now that a basic sketch of Japanese and English accent and intonation has 
been discussed, a few general statements concerning the differences between these 
two languages can be delineated. 

(10) 

accentless words: Japanese has accentless words, English does not. 

stress accent: Accent in English involves pitch, loudness and length while
Japanese uses only pitch (Beckman 1986).

pitch accent: In English, the intonational phrase is made by applying a pitch 
contour to a metrically marked phrase while in Japanese the pitch 
properties of words contribute more to the realization of pitch. 

How, then, do these differences pose problems for Japanese speakers learning
English? The existence of accentless words would seem to be a greater problem for
English speakers learning Japanese than for Japanese speakers learning English.
For Japanese speakers, English represents fewer possible stress types, since English
words are free to be accented on any syllable but must be accented on at least one,
while Japanese words are free to have no accent at all.

Stress accent would seem to be a more serious problem for Japanese learners 
of English, since this division of labor is quite foreign to these learners. A Japanese 
learner of English is faced with a language that relates any one of a number of 
intonational contours to the same metrically marked string, whereas Japanese has 
constraints on pitch contours that are derivable only by understanding the underlying 
pitch properties of words. 

The third aspect, a different phonetic realization of accent, seems as though it
would be extremely difficult to acquire. English and Japanese both have abstract
accent, but this accent is realized quite differently. As Lea (1977) has shown, native
speakers of English manipulate the three parameters of stress accent in complicated
ways to mark stressed words in context. Not only are the relationships between pitch
and intensity potentially complicated, but duration alone seems almost intractably
complex. Van Santen and Olive (1990) have attempted to derive an algorithm which
would model duration in English vowels. They look at several factors that contribute to
vowel length, such as vowel type, consonantal context and position in a phrase, and
find extremely complex interactions between the different variables. That the L1
learner has the capacity for reproducing these variants is amazing in itself.

The study 

The research that will be described in this section will first look at how native 
speakers perceive native and non-native stress. These perceptions will be used as a 
basis for a discussion of the interlanguage English productions. There were two major 
phases to the pilot study: a data-collection phase and a judgement phase. The data- 
collection phase involved an activity used to generate natural speech with different 
types of normal and contrastive stress. The judgement phase used the data generated 
in the first part to elicit native-speaker judgements of English stress-accent. The 
judgements were used as a diagnostic tool to determine the salient characteristics of 
problems in non-native stress productions. 

Data Collection 
Subjects 

Volunteers were solicited from students of English as a second language and 
Japanese as a second language at the University of Pennsylvania. Four native 
Japanese speakers (NJS) and four native English speakers (NES) participated. All
participants were college graduates and had some abilities in both languages. The
researcher, who was trained in the ACTFL oral proficiency rating system, estimated
that the level of the NJSs ranged from "intermediate-mid" to "advanced-high." Two of
the NJSs had been in the U.S. for more than two years and two had been in the U.S. 
for less than a year. Impressionistically, the two with the most time in the U.S. were 
more proficient than the other two. 

Materials and Procedure 

Materials were prepared in order to elicit noun phrases with pitch-accents 
located in normal and non-normal (i.e. exceptionally focused) positions. Pictures were 
used instead of written material so that natural speech could be elicited. The task 
allowed speakers to use language to solve a problem, not just language for 
language's sake. Reading sentences would have failed to control adequately for 
communicative goals. 
Two pairs of picture cards were prepared. Each card had a 4x4 grid with 10
pictures and 6 blank spaces. The blank spaces were added to make it more difficult
for the participants to go through the exercise in a "list-like" repetition of phrases, with
none of the variation in stress desired. Participants were placed in pairs and asked to
orally compare the pictures on their card with the pictures on their partner's card. Pairs
consisting of all possible combinations were used. There were two NES-NES pairs,
two NJS-NJS pairs, and four NES-NJS pairs. After an item was compared, each
participant was asked to determine whether there were 0, 1 or 2 differences between
the picture on their own card and the picture on their partner's card. Each participant
recorded the number of differences on a separate piece of paper. For example, if the
first person had a picture of a red dog as shown in 11, the partner could have any
number of differences in their picture such as those represented by sentences 12a-12d.



(11) I have a red dog.

(12) a. I have a BLUE dog          1 difference (adjective)
     b. I have a red CAT           1 difference (noun)
     c. I have a red dog (too)     0 differences
     d. I have a BLUE CAT          2 differences (adjective and noun)



These are only sample sentences; participants were free to complete the exercise 
using any phrasing they chose. A professional quality cassette recorder and stereo 
microphone were used for recording.⁷
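
The comparison that participants carried out for each item, as in (11) and (12), amounts to counting mismatches between two adjective-noun descriptions. A minimal sketch follows. Representing a picture as an `(adjective, noun)` tuple is an assumption made for illustration and was not part of the original materials, which were pictures.

```python
def count_differences(mine, partner):
    """Number of mismatches (0, 1 or 2) between two (adjective, noun) pairs."""
    return sum(a != b for a, b in zip(mine, partner))


def contrast_positions(mine, partner):
    """Which elements differ, i.e. where contrastive stress would be expected."""
    labels = ("adjective", "noun")
    return [label for label, a, b in zip(labels, mine, partner) if a != b]


first_speaker = ("red", "dog")
partners = [("blue", "dog"), ("red", "cat"), ("red", "dog"), ("blue", "cat")]
counts = [count_differences(first_speaker, p) for p in partners]
```

For the four partner pictures corresponding to 12a-12d, `counts` comes out as 1, 1, 0 and 2, matching the annotations in (12). `contrast_positions` names the element or elements that a speaker would be expected to focus.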

Native Speaker Judgements 
Subjects 

Six graduate students in Linguistics volunteered to make judgements. These 
subjects were asked to participate because it was thought that they would be more 
consistent in making judgements of the sort required. Researchers have found that it 
is difficult to get reliable results on stress judgements.⁸

Materials and Procedures 

The recordings of the data were digitized and tokens were segmented from the
digitized speech. Some "Adj + Noun" phrases elicited were not used because they
overlapped other speech or were barely audible. The final number of tokens
segmented for this experiment was 153, including 73 native and 80 non-native
segments. These tokens were segmented within the intonational phrase in which they
appeared, such as: "I have a red dog," "next, a red dog," or "red dog." Judges listened
to the tokens through headphones as a computer program randomly chose one of the
153 tokens. They were asked to make one of three judgements for each token:
whether stress was on the noun, on the adjective, or even. The judges were allowed
to listen to the token up to five times. It was thought that, by allowing for repetitions, the
judges would make more reliable judgements. However, they were encouraged to
make their judgements in as few listens as possible. If each subject used only the
number of repetitions actually needed to make a judgement, this number of repetitions
could be used to provide a measure of the difficulty of judging the stress in that token.
Information was stored on number of listens as well as on which stress type was
judged.
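
The logic of the judgement session can be sketched as follows. The original program is not described in detail, so this is a reconstruction under stated assumptions: tokens are presented in a shuffled order without replacement, and the `respond` callback (hypothetical) stands in for playing the token and collecting the judge's response, returning `None` when the judge asks for another listen.

```python
import random

MAX_LISTENS = 5
CHOICES = ("noun", "adjective", "even")


def run_session(token_ids, respond, seed=None):
    """Present every token once in random order and log judgement and listens.

    respond(token_id, listen_number) returns one of CHOICES, or None to
    request a replay. Both the callback and the shuffle-without-replacement
    order are assumptions, not details taken from the study.
    """
    rng = random.Random(seed)
    order = list(token_ids)
    rng.shuffle(order)
    results = []
    for token in order:
        judgement = None
        listens = 0
        while judgement is None and listens < MAX_LISTENS:
            listens += 1
            answer = respond(token, listens)
            if answer in CHOICES:
                judgement = answer
        # judgement stays None if no answer was given within five listens
        results.append({"token": token, "judgement": judgement, "listens": listens})
    return results
```

Storing `listens` alongside `judgement` for each token is what makes the repetition counts available as a rough difficulty measure.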



Results

Repetitions

The number of repetitions needed varied from one to three; therefore, the
maximum of five listens was not needed. In Table 1, the mean number of repetitions
needed to make a judgement is listed. The numbers clearly demonstrate that judges
needed to listen to non-native tokens more than native tokens. The first column, the
mean number of listens, shows that tokens required between 1.0 and 2.0 listens on average.
There were many tokens that only needed to be listened to once. For the native
tokens, 40% were not repeated by any of the judges, while only 24% of the non-native
tokens were unrepeated. These repetitions might reflect the difficulty of perceiving
stress; however, other factors might also be involved, such as difficulty in overall
comprehension due to segmental effects.

Table 1 Number of tokens by mean number of listens needed to make a judgement

Mean listens    native tokens    non-native tokens
1.0                  30                 19
1.2                  25                 22
1.3                   9                 19
1.5                   9                 12
1.6                   0                  4
1.8                   0                  3
2.0                   0                  1
TOTAL                73                 80
Agreement among judges

Amount of agreement among the judges is a better measure of difficulty in
judging stress, because it directly addresses the question of stress and not just overall
difficulty in comprehension of the token. Agreement was judged "high" if at least four
out of the six judges agreed on the token as either noun-stressed, adjective-stressed,
or even-stressed. The results are shown in Table 2.
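
The "high agreement" criterion can be stated compactly in code. This is a minimal sketch of the four-out-of-six rule given above; the function names and the list-of-strings representation of the six judgements are assumptions made for illustration.

```python
from collections import Counter

HIGH_AGREEMENT_THRESHOLD = 4  # at least four of the six judges


def agreement(judgements):
    """Return the most frequent judgement and the number of judges giving it."""
    label, votes = Counter(judgements).most_common(1)[0]
    return label, votes


def is_high_agreement(judgements):
    """True if the largest group of agreeing judges meets the threshold."""
    return agreement(judgements)[1] >= HIGH_AGREEMENT_THRESHOLD


clear_token = ["noun", "noun", "noun", "noun", "even", "adjective"]
unclear_token = ["noun", "noun", "noun", "adjective", "adjective", "even"]
```

`clear_token` counts as high agreement (four judges chose "noun"), whereas `unclear_token` does not, since the largest group has only three judges. With three response categories and six judges, the largest group can never be smaller than two.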

Table 2 High agreement tokens

native             non-native
54/73    74%       44/80    55%



Native speech samples elicited a much higher agreement among the judges. It is
clear that there is a definite L1 effect. Table 3 rank orders the eight speakers by
language with respect to the number of high agreement tokens. In this table,
American English speakers are represented by NES1, NES2, NES3 and NES4 and
Japanese speakers by NJS1, NJS2, NJS3 and NJS4.



Table 3 High agreement tokens: By individual speakers

Subject    High Agreement/Total    Percent
NES1       16/18                   89
NES2       13/17                   76
NES3       13/19                   68
NES4       12/19                   63
NJS1       12/19                   63
NJS2       12/20                   60
NJS3       11/19                   58
NJS4        9/22                   41
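
As a consistency check, the per-speaker counts in Table 3 can be summed by language group to recover the totals in Table 2. The sketch below uses only the counts printed in Table 3; the helper names are hypothetical, and the half-up rounding rule is an assumption about how the percentages were rounded.

```python
# (high agreement tokens, total tokens) per speaker, from Table 3
TABLE_3 = {
    "NES1": (16, 18), "NES2": (13, 17), "NES3": (13, 19), "NES4": (12, 19),
    "NJS1": (12, 19), "NJS2": (12, 20), "NJS3": (11, 19), "NJS4": (9, 22),
}


def percent_half_up(part, total):
    """Integer percentage, rounding halves upward (integer arithmetic only)."""
    return (200 * part + total) // (2 * total)


def group_totals(prefix):
    """Sum high-agreement and total tokens over speakers whose ID starts with prefix."""
    high = sum(h for name, (h, t) in TABLE_3.items() if name.startswith(prefix))
    total = sum(t for name, (h, t) in TABLE_3.items() if name.startswith(prefix))
    return high, total


speaker_percents = {name: percent_half_up(h, t) for name, (h, t) in TABLE_3.items()}
native = group_totals("NES")
non_native = group_totals("NJS")
```

The group sums are 54 of 73 for the native speakers and 44 of 80 for the non-native speakers, which are the figures reported in Table 2.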



There was a much higher tendency for agreement on the NES tokens than the NJS
tokens. The native speaker "NES1" had 18 total tokens played to judges, and 16
tokens were agreed on by four or more of the judges. NJS4 produced only 9 high
agreement tokens out of 22. It is clear that there is an L1 effect, since the non-native
speakers were never above 63%. However, there was a lot of variability in
judgements of native speech; considering native vs. non-native speech does not
totally account for this variability.

Table 4 below gives more information by showing not only the high tokens, but 
also the non-high tokens. 
Table 4 Agreement among judges for different tokens (by L1)

Judges agreeing    English L1 subjects    Japanese L1 subjects
No.                No.      %             No.      %
6                  19       26            10       13
5                  19       26            18       23
4                  16       22            16       20
3                  17       23            34       43
2                   2        3             2        3



Although Table 2 shows clear differences in the perception of native and non-native
speech, Table 4 shows them more clearly. The proportion of tokens on which only three judges
agreed was 23% for the native tokens and 43% for the non-native tokens.
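
The relation between Table 4 and Table 2 can be checked with a short calculation. The sketch assumes that the five rows of Table 4 correspond to six, five, four, three and two judges agreeing. That reading is inferred from the counts rather than stated explicitly, but it is the one under which the top three rows sum to the high-agreement totals. The helper names and the half-up rounding rule are likewise assumptions.

```python
# Number of tokens by size of the largest agreeing group of judges (Table 4).
# The keys 6..2 are an inferred reading of the row labels.
TABLE_4 = {
    "English L1": {6: 19, 5: 19, 4: 16, 3: 17, 2: 2},
    "Japanese L1": {6: 10, 5: 18, 4: 16, 3: 34, 2: 2},
}


def percent_half_up(part, total):
    """Integer percentage, rounding halves upward (integer arithmetic only)."""
    return (200 * part + total) // (2 * total)


def summarize(counts, threshold=4):
    """Return (total tokens, tokens with at least `threshold` judges agreeing)."""
    total = sum(counts.values())
    high = sum(n for judges, n in counts.items() if judges >= threshold)
    return total, high


summary = {group: summarize(counts) for group, counts in TABLE_4.items()}
three_judge_share = {
    group: percent_half_up(counts[3], sum(counts.values()))
    for group, counts in TABLE_4.items()
}
```

Under this reading, the rows for four or more judges add up to 54 of 73 and 44 of 80, and the three-judge row gives 23% and 43%, the figures cited in the text.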



Discussion 



There are some general comments to be made regarding the results discussed
above. First, Japanese learners are capable of producing stress forms that are
categorically perceived by native listeners. Not all tokens used for the judgement test
were categorically perceived, for either NESs or NJSs. Secondly, the NJS subjects who
participated in this study showed what might be developmental effects. The more
advanced learners produced forms that were more often agreed upon by the native
judges. This potential for development, or at least differential competence in stress,
would indicate that this is a variable worthy of further study.

There are clear indications of differences in how tokens produced by different
speakers were perceived, but nothing has yet been said about what the cause for
these differences in stress judgements might be. Why did the NJSs perform at a much
lower level overall, compared to the NESs? In order to explore the possible sources
for differences, I will begin by discussing the three problem areas discussed above
in (10). There seemed to be no evidence that learners interpreted English words as
being accentless. This possibility should not be ruled out, however, since this was a
small group of learners, none of whom were in the earliest stages of learning English.
Learners of lower proficiency levels, different learners, or these same learners under
different circumstances might do so. I have observed in casual situations that
sentences are sometimes ended with no fall, which would seem to indicate that the
speaker was behaving as if there were no accent on the final word, where there
should have been.
The second difference listed in (10), stress accent, probably has some effect,
and might be a variable that is not manipulated properly. All of the syllable nuclei
were measured for all of the tokens. It seemed that the NJSs varied little in the length
of the syllable nucleus whereas the NESs lengthened stressed syllables to a greater
extent. The database was not controlled for syllable shape to an extent
great enough to warrant broad conclusions about length. An experiment that controls
more for syllable shape will be necessary to explore this difference more fully.

The final difference listed in (10), pitch accent, seemed to have a great effect.
Although it is not possible to formalize the demonstrated effects at this point, it seems
clear that certain phonological rules of "pitch-accent" are operating on the English
productions of these speakers. (8c) shows two pitch accented words in Japanese in a
neutral context (no special focus on either term). If we look at the same phrase but
with accent only on ozyóo-tyan, the result is a hat-shaped intonational pattern (13).
This type of phrasing was observed on many tokens.



(13) [stylized contour not reproduced: hat-shaped pattern]

a. Hiraoka-san no ozyóotyan

In summary, each of the three major differences related in (10) seems to yield
some interesting possibilities, but more work is needed to understand the
characteristics of interlanguage stress productions by Japanese speakers of English.
Further studies will be needed with larger numbers of subjects, controlling for different levels of
proficiency. For each of the different hypotheses, a data-set will need to be
constructed that will explore the specific variables in a more complete way.



¹ A few of the more significant studies of second language prosody have been Backman (1977), Sethi
(1982), Cruz-Ferreira (1980; 1987; 1989), and Berkovits (1980).

² The contours used in this paper are stylized; they do not represent certain phonetic factors such as
downdrift and segmental effects, since they are meant to represent only the phonologically significant
aspects of the intonational contour.

³ Some of the representative examples of approaches to English are Liberman (1975) and Liberman and
Pierrehumbert (1984). For Japanese, see Poser (1984), Pierrehumbert and Beckman (1988), and
Haraguchi (1988). Beckman and Pierrehumbert (1986) and Beckman (1986) compare Japanese and
English.

⁴ Here, I mean only to contrast English and Japanese with other languages where accent is much more
predictable, and with very few exceptions. Autosegmental analyses have been offered for English in
Halle and Vergnaud (1989) among many others. For Japanese, a comparable analysis has been given by
Haraguchi (1991). These analyses show that accent is much more predictable than seems to be possible
on a superficial level; however, the fact remains that the system is much less predictable with Japanese,
where there are many minimal pairs like the háshi - hashí - hashi paradigm as shown above. English has
fewer examples, with réefer-refér, díffer-defér, and pérvert-pervért being differentiated solely by stress
for many speakers (Flege & Ocke-Schwen, 1989).

⁵ The basic line of this argument comes from Liberman (1975); however, there is some influence in the
notation used by Pierrehumbert (1980). For the purposes of this paper, I have deviated from both of their
systems in order to simplify the argument; it does not weaken the argument.

⁶ For all of the possible permutations of accented and unaccented nouns and adjectives in Japanese, 9
different pitch contours are shown in Kurasawa-Williams (1992) and Teaman (1992).

⁷ Central air conditioning caused an audible hum on the tape which made the quality of the tapes less
than optimal, though they were good enough to proceed with further analysis.

⁸ See Lea (1977) for an extensive discussion of methodology in stress judgements.